Model Tuning - Credit Card Users Churn Prediction Case Study

Background & Context

Thera Bank recently saw a steep decline in the number of its credit card users. Credit cards are a good source of income for banks because of the various fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, and foreign transaction fees. Some fees are charged to every user irrespective of usage, while others are charged only under specified circumstances.

Customers leaving the credit card service leads to a loss for the bank, so the bank wants to analyze its customer data to identify which customers are likely to leave its credit card service and why, so that the bank can improve in those areas.

Thera Bank wants a classification model that will help it improve its services so that customers do not renounce their credit cards.

We need to identify the best possible model that will give the required performance

Objective

  1. Explore and visualize the dataset.

  2. Build a classification model to predict if the customer is going to churn or not.

  3. Optimize the model using appropriate techniques.

  4. Generate a set of insights and recommendations that will help the bank.

Data Dictionary:

This dataset contains information on Thera Bank's customers.

Loading Libraries

Load Dataset

View the first and last 5 rows of the dataset

Understand the shape of the dataset

Observations:

Let us check for null values and duplicates
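The null and duplicate checks above can be sketched as follows on a small illustrative dataframe (the column names are assumptions, not the real schema):

```python
import numpy as np
import pandas as pd

# Toy dataframe standing in for the bank's customer data
df = pd.DataFrame({
    "Customer_Age": [45, 49, np.nan, 45],
    "Credit_Limit": [12691.0, 8256.0, 3418.0, 12691.0],
})

null_counts = df.isnull().sum()       # missing values per column
n_duplicates = df.duplicated().sum()  # count of fully duplicated rows
```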

Observations:

Check the data types of the columns for the dataset

Observations:

Summary of the dataset

Observations:

Let us look at the unique value counts of the different columns
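A minimal sketch of the unique-count check, using assumed column names for illustration:

```python
import pandas as pd

# Toy categorical columns standing in for the dataset
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M"],
    "Education_Level": ["Graduate", "High School", "Graduate", "Uneducated"],
})

# Number of distinct values in each column
unique_counts = df.nunique()
```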

Observations:

Data Preprocessing - Feature Engineering

Dropping the client ID column from the dataframe

Let us label-encode our dependent variable and make the Attrition_flag column numeric
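A minimal sketch of the target encoding, assuming the flag takes the values "Existing Customer" and "Attrited Customer"; attrited customers (the churners, our positive class) map to 1:

```python
import pandas as pd

df = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer", "Attrited Customer", "Existing Customer"]
})

# Churners become the positive class (1)
df["Attrition_Flag"] = df["Attrition_Flag"].map(
    {"Existing Customer": 0, "Attrited Customer": 1}
)
```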

Observations:

Let us bin some columns into ranges for better EDA plotting and to minimize category buckets
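Binning a continuous column into labelled ranges can be sketched with `pd.cut`; the column (customer age) and the bin edges here are assumptions for illustration:

```python
import pandas as pd

ages = pd.Series([26, 34, 47, 58, 66])

# Bin ages into labelled ranges; intervals are right-closed by default
age_bins = pd.cut(
    ages,
    bins=[25, 35, 45, 55, 75],
    labels=["26-35", "36-45", "46-55", "56-75"],
)
```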

Observations:

Datatype Conversions

Observations:

Univariate Analysis

Observations:

Observation on Customer Age

Observations:

Observations on Credit limit

Observations:

Observations on Total Revolving Balance

Observations:

Observations on Total Amount in Q1 vs Q4

Observations:

Observations on Total transaction count in Q1 vs Q4

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Bivariate Analysis

Observations:

Observations:

Let us check the relationship of the dependent variable (Attrition_Flag) with the independent variables using bivariate plots

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Outlier Detection

Observations:

We will treat them all using the capping method.
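The capping method clips values outside the usual IQR fences (Q1 − 1.5·IQR and Q3 + 1.5·IQR) back to those bounds. A minimal sketch on a toy series:

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 100])  # 100 is an outlier

# IQR fences
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Cap (winsorize) values to the fences instead of dropping rows
capped = s.clip(lower=lower, upper=upper)
```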

Outlier Treatment

Let's verify the outlier treatment

Observations:

Missing-Value Treatment

Observations:

Data Preparation for Modeling

Imputing Missing Values
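One common way to impute missing categorical values is scikit-learn's `SimpleImputer` with the most frequent category; the column name and values here are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "Income_Category": ["Less than $40K", np.nan, "Less than $40K", "$40K - $60K"]
})

# Replace missing entries with the modal category
imputer = SimpleImputer(strategy="most_frequent")
df["Income_Category"] = imputer.fit_transform(df[["Income_Category"]]).ravel()
```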

Observations:

Let's inverse map the encoded values

Checking inverse mapped values/categories

Observations:

Creating Dummy Variables
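Dummy-variable creation can be sketched with `pd.get_dummies`; `drop_first=True` drops the redundant reference level so the encoded columns are not collinear:

```python
import pandas as pd

df = pd.DataFrame({"Gender": ["M", "F", "M"]})

# One-hot encode, dropping the first (reference) category
dummies = pd.get_dummies(df, columns=["Gender"], drop_first=True)
```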

Building the model

Model evaluation criterion:

The model can make wrong predictions in two ways:

  1. Predicting a customer is going to leave the bank when the customer does not attrite - loss of resources.

  2. Predicting a customer is not going to leave the bank when the customer attrites - loss of opportunity.

Which case is more important?

Predicting a customer is not going to leave the bank when the customer actually attrites, i.e., losing a potential customer, since the bank will not target them with offers or promotional calls.

How do we reduce this loss, i.e., reduce false negatives?

The bank would want recall to be maximized: the greater the recall, the lower the chance of false negatives.

Let's evaluate all 6 models' performance using KFold and cross_val_score with recall as the metric
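The cross-validation loop can be sketched as below. The two estimators and the synthetic data are stand-ins for the notebook's actual six models and the bank dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic imbalanced data standing in for the churn dataset
X, y = make_classification(n_samples=300, weights=[0.84], random_state=1)

kfold = KFold(n_splits=5, shuffle=True, random_state=1)

models = {
    "Bagging": BaggingClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
}

# Mean cross-validated recall per model
recalls = {
    name: cross_val_score(m, X, y, cv=kfold, scoring="recall").mean()
    for name, m in models.items()
}
```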

Observations:

Observations:

Let's check the performance scores (accuracy, recall, precision, and F1 score) for all 6 models on training and validation data

Model 1 - Bagging

Observations:

Model 2 - Random Forest

Observations:

Model 3 - Gradient Boost Model

Observations:

Model 4 - AdaBoost Model

Observations:

Model 5 - XGBoost Model

Observations:

Model 6 - Decision Tree

Observations:

Oversampling data using SMOTE

Let's evaluate all 6 models' performance on the oversampled data using KFold and cross_val_score for recall

Observations:

Observations:

The top three models on the oversampled data giving the best recall are XGBoost, Random Forest, and GBM (all oversampled).

Model 1 - Bagging on oversampled data

Observations:

Model 2 - Random Forest on oversampled data

Observations:

Model 3 - Gradient Boost Model on oversampled data

Observations:

Model 4 - Ada Boost Model on oversampled data

Observations:

Model 5 - XG Boost Model on oversampled data

Observations:

Model 6 - Decision Tree on oversampled data

Observations:

Undersampling train data using Random Under Sampler

Let's evaluate all 6 models' performance on the undersampled data using KFold and cross_val_score for recall

Observations:

Observations:

Let's check all 6 models' undersampled performance on the training data set

Model 1 - Bagging on Undersampled data

Observations:

Model 2 - Random Forest on Undersampled data

Observations:

Model 3 - Gradient Boost Model on Undersampled data

Observations:

Model 4 - Ada Boost Model on Undersampled data

Observations:

Model 5 - XG Boost Model on Undersampled data

Observations:

Model 6 - Decision Tree on Undersampled data

Observations:

Comparing all 6 models performance in train and validation

Observations:

The top three models giving the best recall score on validation are Gradient Boost Undersampled, AdaBoost, and XGBoost Undersampled.

Choosing 3 models to tune, with reasons

Hyperparameter Tuning

AdaBoost Undersampled model Tuning
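The tuning step can be sketched with `RandomizedSearchCV` scored on recall; the parameter grid values and synthetic data below are illustrative assumptions, not the notebook's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data standing in for the undersampled training split
X, y = make_classification(n_samples=300, weights=[0.8], random_state=1)

# Illustrative search space
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.1, 1.0],
}

search = RandomizedSearchCV(
    AdaBoostClassifier(random_state=1),
    param_distributions=param_grid,
    n_iter=4,
    scoring="recall",  # optimize for recall, per the evaluation criterion
    cv=3,
    random_state=1,
)
search.fit(X, y)
best = search.best_estimator_
```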

Observations:

XG Boost Undersampled Model Tuning

Observations:

Gradient Boost Undersampled Model Tuning with init='zero'
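In scikit-learn, `GradientBoostingClassifier`'s `init` parameter sets the initial estimator; `init='zero'` starts boosting from zero raw predictions instead of the default log-odds prior. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data standing in for the undersampled training split
X, y = make_classification(n_samples=200, random_state=1)

# Boosting starts from zero raw predictions
gbm_zero = GradientBoostingClassifier(init="zero", random_state=1)
gbm_zero.fit(X, y)
score = gbm_zero.score(X, y)  # training accuracy
```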

Observations:

Gradient Boost Undersampled Model Tuning with init=AdaBoostClassifier()

Observations:

Hypertuned Model Comparison

Observations:

Performance of the best model on Test data - XGBoost Undersampled Tuned

Observations:

Feature Importances of the Best Model - XGBoost Undersampled Tuned

Observations:

Pipeline

Pipelines for productionizing the final model - XG Boost Undersampled Tuned

Column Transformer

We impute missing values for the whole dataset so that any missing values in future data can be handled automatically.
Since we already know the best model to proceed with, we don't need to split the data into three parts.
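A production pipeline along these lines can be sketched with a `ColumnTransformer` that imputes numeric and categorical columns (one-hot encoding the latter) before the classifier. The column names are assumptions, and a `RandomForestClassifier` stands in for the tuned XGBoost model to keep the sketch dependency-light:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

num_cols = ["Customer_Age", "Credit_Limit"]  # assumed numeric columns
cat_cols = ["Income_Category"]               # assumed categorical column

# Impute numerics with the median; impute and one-hot encode categoricals
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), num_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])

# Stand-in classifier; the tuned XGBoost model would slot in here
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(random_state=1)),
])

# Toy data with missing values to exercise the imputers
X = pd.DataFrame({
    "Customer_Age": [45, np.nan, 50, 38],
    "Credit_Limit": [12691.0, 8256.0, np.nan, 3313.0],
    "Income_Category": ["Less than $40K", None, "$40K - $60K", "Less than $40K"],
})
y = [0, 1, 0, 1]

model.fit(X, y)
preds = model.predict(X)
```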